Plotly Workshop Tutorial

Welcome to CodeRATS's Plotly workshop! This consists of two parts:

Part 1a: Basics of python and plotly

Plotly makes figures in 4 steps:

  1. Make a "canvas" or "figure" to draw on
  2. Add in your data points
  3. Customize your visual (colors, point symbols, axes names and titles, etc.)
  4. Annotate the visual

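There are two versions of Plotly: Plotly Express (plotly.express), for quickly making plots, and Plotly Graph Objects (plotly.graph_objects), for working directly with figure components.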

We will start off by using Plotly Express to make a scatter plot.

Let's compare petal width (petal_width) to petal length (petal_length) in a scatter plot. Run the following code block, and notice that you made a scatter plot in two lines of code!

Try out different viewing methods:

Say you want to color these points by species. It is as simple as passing the color= parameter to px.scatter(...). plotly.express will automatically color the points according to the values in the species column of your dataframe, and include a legend.

Try:

Try some of the other plotly.express plots! View some of the basic options:

Part 1b: Advanced Plotly features with Plotly Graph Objects

Plotly Express is great for quickly making a plot to explore your data. Sometimes, you need a cleaner figure, or you just want to plot something more complicated. Plotly Graph Objects lets you work directly with the components of a Plotly figure, so you can customize each of them.

Let's make a scatter plot again, but this time with plotly graph objects. We will be loosely following this tutorial: https://towardsdatascience.com/tutorial-on-building-professional-scatter-graphs-in-plotly-python-abe33923f557

Step 1: Initialize your figure and add data to your plot. To do this, we use add_trace(...). A trace is like a layer of data (or a graph object) to add to the figure. You can call fig.add_trace(...) multiple times to add multiple traces (say a scatter plot overlaying a bar plot). Other helper methods exist such as add_shape(...) and add_hline(...); we will get to those later.

For now, we will only add one trace, which will be a scatter plot. To make this, we will call go.Scatter(...) to make a scatter plot graph object. Then we will add this scatter graph object to our figure my_fig using my_fig.add_trace(...).

Note that the graph object does not take in a pandas dataframe (as Plotly Express does); instead, the data is passed directly as an array (or a pandas Series). This is slightly inconvenient but also much more flexible. Also note that, unlike Plotly Express, Plotly Graph Objects does not generate a graph title or axis titles. These need to be defined explicitly.

To do this, call update_layout(...). Check out the documentation at https://plotly.com/python/figure-labels/ and https://plotly.com/python/axes, and make the following updates:

  1. Add in graph title and axis titles with specified font sizes
  2. Adjust position of graph title
  3. Change the background color to white
  4. Change the line color of the axes from white to gray
  5. Change the color of all text on the graph to gray
  6. Change the color of the data points to a darker shade of blue, so they stand out more

All the CSS named colors can be found at https://developer.mozilla.org/en-US/docs/Web/CSS/color_value

Let's look at the underlying data structure of a graph_objects Figure. Try examining my_fig.data and my_fig.layout.

Was this what you expected? Plotly figures are built upon nested dictionaries (and lists to store multiple traces) and can be inspected and modified just like normal dicts and lists. I don't recommend trying to create or modify a Figure from scratch this way; methods exist for a reason. But viewing the underlying data can be helpful for remembering an attribute name or understanding the current state of the figure.

Next, we want to distinguish the points by category. Plotly Express does this automatically if you pass a column name to the color or symbol parameters. With graph_objects, just as you set marker_color to a single value above, you could instead pass a list giving the desired color of each point (based on its category). Similar techniques could be used to change the marker size or symbol as well. However, adding a separate trace for each category is usually easier to deal with when making other updates later on.

The approach here is to use a for loop. For each unique category, make a new trace (scatter plot graph object) with the corresponding data from your table and add it to your figure.

You will need to specify the colors to use to plot each category. For now, we will make a dictionary with the species as the key and the color (this time in hexcode) as the value. View more colors at https://htmlcolorcodes.com/.

Then, make the figure!

  1. Initialize figure
  2. Add traces
  3. Update layout

Maybe it would be easier to view the graph if each species had its own plot. However, we still want to be able to compare them. Subplots allow us to arrange multiple plots on the same figure. Check out this link before continuing: https://plotly.com/python/subplots/#subplots-with-shared-yaxes

The plot still looks a little busy with all the redundant axis labels. The legend is also redundant, since we already have the data separated out. To make it look nicer, we will:

  1. Remove the legend
  2. Remove the duplicated axis titles
  3. Remove axes lines
  4. Ensure all subplots are displaying a consistent range for both axes

Lastly, let's add a few finishing touches:

  1. Add all data points to each subplot
  2. Add additional information to the hover labels and fix the format (read more here: https://plotly.com/python/hover-text-and-formatting/#customizing-hover-text-with-a-hovertemplate)
  3. Make the titles look better
  4. Add signature

Finally, let's save your figure! You can easily download your plot as a PNG by clicking on the camera icon on the top right. If you want your figure saved as an SVG or PDF (or another image format), use the write_image(...) method. To use this functionality, you may need to install an additional dependency, kaleido. There can be some issues in getting this package to work, however. Let us know if you need help!

Part 2: BYOC

Now it's your turn! Keep playing around with the iris dataset and try other plots (we would especially recommend boxplots https://plotly.com/python/box-plots/ or heatmaps https://plotly.com/python/heatmaps/).

OR

Load in your own data you want to visualize.